Mental measurement in the Armed Forces is an absolute necessity.
Two interrelated existing problems must be resolved before mental
measurement can be used most effectively. First, a determination of
which jobs or occupations need to be filled in the Armed Forces is
necessary, and, additionally, performance must be measured. The
second problem is testing or mental measurement. Test development
must be centered around a job or occupation or a series of jobs or
occupations and correlated with job or occupation performance.

A cursory review of the literature on military testing indicates
that testing is being conducted without resolving the first problem
in any sound testing program. Additionally, there are indications
that correlation studies of currently used tests are frequently not
conducted in an unbiased scientific manner. High correlation
coefficients, no matter how they are obtained, have possibly become
the ultimate goal of military testing.

CHAPTER I
HISTORICAL BACKGROUND 


Military commanders have for many centuries judged the 
mental capacity of their subordinates. This was and is 
today one of the principal elements military commanders must
consider when assigning personnel. Selection and assignment
of personnel during the nineteenth century, a period of
relatively small regional wars, was based upon political
considerations, judgement of mental capacities, technical
know-how, and necessity, in that order.

Twentieth century global warfare required many more men 
on the battlefield, each assigned so that his contribution
to the war effort would be maximized. Military commanders 
found it impossible to evaluate millions of men on a personal 
judgement basis when the United States entered World War I. 
The American Psychological Association appointed a committee 
to study this problem. A method of determining the 
intellectual level of millions of men was desired. This was 
considered necessary for the rapid classification, training, 
and assignment of men to different types of service in order 
to save time and to properly utilize available human 
resources. 

Prior to World War I, psychological testing had not, in
general, attempted to measure individual differences for the
purpose of personnel placement. Mental measurement tests
had been developed for use in clinical examination of
psychiatric patients. In addition, Cattell, in 1890,
described a series of tests which were being used to
determine the intellectual level of college students,
although his idea that "a measure of intellectual functions
could be obtained through tests of sensory discrimination and
reaction time" appears to be somewhat in error. (1)

(1) Anne Anastasi, Psychological Testing, The MacMillan
Company, New York, 1955.


A. BASIC MILITARY NEEDS


A group test was desired to handle the classification
problems of World War I. Army psychologists drew upon all
available test material, much of which was unproven as to
its usefulness, and developed the Army Alpha and Beta tests.
These tests were well suited for group use, where only a
very general classification was required. The Alpha test
was designed for literate groups while the Beta was for use
with those who were not literate in the English language.
The Beta test proved to be less valid than the Alpha, but
it was sufficiently discriminating for emergency use. An
additional group test, the "Personal Data Sheet," was used
in World War I to screen out those individuals with
psychological difficulties. Psychologists also developed
tests of special or specific abilities that proved to be
moderately useful. Army psychologists in collaboration with
their civilian contemporaries were able to develop a group
intelligence test that contributed immeasurably to the
solution of the emergency classification problems and
ultimate success of the war effort.

Group tests developed by the Army in World War I were
eagerly accepted for civilian use after the war. Group tests
became the panacea in personnel selection and placement.
This movement encompassed peoples of all ages and groups.
Studies of special groups were undertaken for various reasons.
The use of group tests became indiscriminate, and when the
results failed to meet expectations much hostility and
skepticism developed. Much of this hostility and skepticism
is still present, one and a half generations later, for
reasons which were and may still be well founded.


B. ADVANCES IN MILITARY TESTING


During the interim between World Wars I and II, following
the advent of group testing, many advances were made in the
use of mental measurement tests. The Navy's Bureau of Navigation
organized a personnel testing program as a part of its
training division in 1924. A General Classification Test
was used at training stations to select enlisted men for
Navy schools. Later, this same test was used as a screening
device at recruiting stations. Other tests were also
introduced and by December 1941 the following tests were in
general use at recruit training stations: "General
Classification, Mechanical Aptitude Test, Arithmetic Test,
English Test, Spelling Test, and Radio Aptitude Test." (2)

These tests had served well during peacetime when the 
ratio of selection to applicants for Naval service was 
rather low. However, when this ratio was raised as mass 
mobilization became a necessity, the tests were found to be 
grossly inadequate for selection purposes. There was little 
differentiation between good men and their capabilities in 
various rates. Training schools found that many men enrolled
had little capability in their assigned specialty. Local 
testing programs developed at many stations in an attempt to 
overcome these difficulties. 

By May 1942, the enormity of the personnel testing 
program and its concomitant problems was recognized. A 
request for assistance was made to the Office of Scientific 
Research and Development. As a result of this and other 
developments, a two-pronged attack was launched to resolve
the problems related to testing as soon as possible.

(2) Personnel Research and Test Development in the Bureau
of Naval Personnel, ed. Dewey B. Stuit, Princeton University
Press, 1947, p. 6.





The problems were essentially divided into two parts. 
First, since the present tests revealed a lack of validity, 
tests had to be developed which were valid. Second, little 
was known about the requirements of Navy training on a mass 
basis. Knowledge concerning the second aspect of this 
problem was necessary before the first part could be resolved. 
This emergency was met in much the same fashion as the 
identical problem of selection, classification, and placement 
was met in World War I. Both the Army and Navy faced these 
problems and, as before, psychologists and personnel 
officials of both services pooled their resources and procured
civilian assistance and civilian tests. From late 1942 onward,
the tests improved in validity and continued to do so for the
remainder of the war.

Following World War II, in 1946, a permanent research
organization was approved to: (4)

Undertake a coordinated program of personnel research
and test development centered around the major
personnel problems of the Navy...to conduct studies
on personnel, policy, techniques, and procedures, and
on the assignment, evaluation, promotion or advancement,
and morale of officer and enlisted personnel: ...and to
develop such psychological and educational tests and
other instruments as may be necessary for the selection,
classification, training, and evaluation of performance
of Navy personnel. (3)


(3) Future wars may not require such mass mobilization, and
in any event will surely not allow sufficient time for the
construction of tests which are valid enough to be used as a
reliable guide for the selection, placement and training of
personnel.

(4) Stuit, op. cit., p. 11.








This organization has had various titles over the last
eighteen years and has made many recommendations for
improving the Navy's personnel administration. Fund
limitations, as well as opposition to change, have prevented
full implementation of the program and its recommendations.


C. SELECTED NAVY TESTS


As a result of the groundwork laid during World War II,
and the permanent organization established as the Personnel
Research Division of the Bureau of Naval Personnel shortly
after the war, many methods of mental measurement have been
devised. This discussion will be limited to those tests
considered basic, with mention of other special Navy tests.

Enlisted Basic Test Battery

The General Classification Test (GCT)

...is a 100-item test designed to measure the ability
to comprehend material of a verbal nature. The 40
sentence-completion items and 60 verbal-analogy items
which comprise the test are arranged in order of
increasing difficulty. The testee is to select the
one most correct answer from the five possible answers
that are given. A time limit of 35 minutes is used. (5)

(5) Development and Standardization of the U. S. Navy Basic
Test Battery, Form 6, Bureau of Naval Personnel Research
Report 58-2, U. S. Naval Personnel Research Field Activity,
San Diego, Calif., Nov. 1958, p. 1.

This test, like all Navy tests designed for mental
measurement, is standardized and differences can be readily
established. Appendix A illustrates the comparison of the
Navy's standard T-scores with Z-scores, Stanines, and IQ
scores. All of these different methods of scoring are based
on a normal probability curve.
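
Because all of these scales rest on the same normal curve, a score
on one can be restated on another. The following Python sketch is
illustrative only and is not taken from Appendix A. It assumes the
usual textbook conventions, none of which are stated in the text: a
T-score with mean 50 and standard deviation 10, a deviation IQ with
mean 100 and standard deviation 15, and stanines cut at intervals of
half a standard deviation. The function names are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist

# Assumed conventions (not given in the source text).
T_MEAN, T_SD = 50.0, 10.0      # T-score scale
IQ_MEAN, IQ_SD = 100.0, 15.0   # deviation-IQ scale

# Upper z boundaries of stanines 1 through 8; stanine 9 lies above the last.
STANINE_CUTS = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]


def t_to_z(t_score: float) -> float:
    """Express a T-score as a z-score (standard deviations from the mean)."""
    return (t_score - T_MEAN) / T_SD


def z_to_iq(z: float) -> float:
    """Express a z-score on the assumed deviation-IQ scale."""
    return IQ_MEAN + IQ_SD * z


def z_to_stanine(z: float) -> int:
    """Place a z-score in one of the nine stanine bands."""
    return bisect_right(STANINE_CUTS, z) + 1


def z_to_percentile(z: float) -> float:
    """Percentage of a normal population scoring at or below z."""
    return 100.0 * NormalDist().cdf(z)


if __name__ == "__main__":
    for t in (30, 40, 50, 60, 70):
        z = t_to_z(t)
        print(t, z, z_to_stanine(z), z_to_iq(z), round(z_to_percentile(z), 1))
```

Under these assumptions a T-score of 60 lies one standard deviation
above the mean, which corresponds to stanine 7 and a deviation IQ
of 115. A scale defined with a different standard deviation would
give different figures.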

The Arithmetic Test (ARI) is designed as

...two separately-timed subtests, a 20-item Arithmetic
Computation Subtest and a 30-item Arithmetic Reasoning
Subtest. Both kinds of items are in five alternative
multiple choice form. (6)

Time limits are also established for both of these
subtests and are 12 and 35 minutes respectively.

The Mechanical Test (MECH) is designed as

...two separately-timed 50-item subtests, Tool Knowledge
and Mechanical Comprehension. ...time limits are 10
minutes for the Tool Knowledge Subtest and 25 minutes
for the Mechanical Comprehension Subtest.

Each tool knowledge item consists of five pictures of
mechanical or electrical tools or equipment. The testee
is to select from the last four objects pictured the one
which is most closely associated with the tool or
object in the first picture. ...

Each mechanical-comprehension item consists of one or
more drawings in which a mechanical problem is presented.
The testee is to show whether he understands the
mechanical principles involved by marking one of the three
possible answers provided. (7)


The fourth and last test of the Navy's Enlisted Basic Test
Battery is the Clerical Test (CLER), which is designed to

...measure the ability to observe quickly and accurately,
consists of 240 pairs of five-to-nine-digit numbers which
must be compared at a high rate of speed. The examinee
indicates whether the two members of the pair are the
same or are different by marking an "S" or "D" in the
adjoining answer space. (8)

(6) Ibid., p. 2.
(7) Ibid., pp. 2-3.
(8) Ibid., pp. 3-4.





Other Navy Enlisted Tests

In order to obtain supplementary information necessary
for proper classification of enlisted personnel, several
special tests have been devised. These tests include but
are not limited to the following: (9)

1. Electronics Technicians Selection Test
2. Radio Code Test
3. Sonar Pitch Memory Test
4. Telephone Talker Test
5. Navy Literacy Test
6. Non-verbal Classification Test (10)

One other test, perhaps the most important of all Navy
enlisted tests, is the Advanced Technicians Test. Because
of the increasing complexity of today's scientific and
technological requirements, more effective screening methods
are provided for advanced technical training by the use of
a new test. This test, the Advanced Technicians Test,
consists of four parts: Reading Comprehension, Mathematics,
Physics, and Electricity.

The Advanced Technicians Test does not replace the basic
test batteries. It is given to enlisted personnel in second
or subsequent enlistments, and results are recorded on page 3
of the service record and on the Bureau of Naval Personnel
Enlisted Master Tape. Tests are given at designated Enlisted
Classification Units and retests are not authorized. Tests
are to be administered to the following categories of
personnel not previously tested: (11)

1. qualified submariners
2. less than 12 years service
3. all others who desire to be tested
4. all first reenlistees
5. all applicants for nuclear power training

(9) Information and Education Manual, NavPers 16,963D,
Bureau of Naval Personnel, Aug. 1955, p. 70.

(10) This test is designed for those who cannot read English.

Basic Tests for Naval Officer Personnel

The need for an intellectual screening device for officer 
personnel was extremely urgent owing to the necessity for a 
rapid expansion of the Navy early in World War II. In an 
effort to meet this urgent requirement, two groups of tests 
were developed in 1942 and 1943. These tests were to be 
given and used as a part of an initial screening of applicants 
and as a classification tool after applicants had been 
processed beyond this initial stage. 

Basic among the tests developed was the Officer
Qualification Test. This test, still used with modifications,
consisted of three parts--vocabulary, mechanical comprehension,
and arithmetical reasoning. It was felt that independent
verbal, mechanical, and arithmetical abilities were indicated
by factorial analysis.

(11) Bureau of Naval Personnel Instruction 1236.2, 20 June 1941.

The vocabulary portion of the 100-question test
consisted of 50 opposite items where the testee was to
select a word from among five that was nearest to being the
opposite of a stimulus word. The arithmetical portion of the
test consisted of twenty questions having five choices for
each question. The Mechanical Comprehension test completed
the Qualification Battery. This subtest consisted of thirty
items illustrating mechanical elements about which a
question was asked and an answer chosen from three alternatives.
Sixty minutes were allowed to complete the entire battery, with
recommended times for each section.

An Officer Classification Test was developed to
differentiate among officers in order that assignments could
be made to specific duties with a minimum of misplacement.
This test battery is composed of five sections as follows: (12)

  I.   Verbal Reasoning Test           75 five-choice analysis items
  II.  Mechanical Comprehension Test   48 five-choice mechanical
                                       comprehension items
  III. Mathematics Test                50 mathematics items
  IV.  Relative Movement Test          50 four-choice relative
                                       movement items
  V.   Spatial Test
       A. Block Assembly               30 four-choice block
                                       assembly items
       B. Block Rotation               30 five-choice block
                                       rotation items

(12) Stuit, op. cit., p. 105.

CHAPTER II
CURRENT USES OF TESTS 


Tests as mental measurement devices are currently 
enjoying widespread use. The majority of psychologists and 
many personnel administrators feel that mental testing has 
attained its majority and is moving toward a yet to be found 
maturity. Many others feel that mental testing has attained 
its long sought maturity and that the proper use of mental 
tests will benefit mankind immeasurably. A few professionals 
in the mental testing field feel that testing is still in 
its infancy and that test results furnish only a sample of 
individual capacities. These few professionals question 
mental testing and ask if mental tests do give a fair and 
effective measure of a person's intelligence, aptitude, 
knowledge or ability to think. 

A listing of tests currently in use with a brief 
description of each would fill several volumes. Even a list 
of companies which furnish testing services would be rather 
extensive. However, the five giants of this industry are 
(1) Educational Testing Service, Princeton, New Jersey,
(2) Psychological Corporation, New York, N. Y., (3) Harcourt,
Brace, and World, Inc., (4) California Test Bureau, Los
Angeles, Calif., and (5) Science Research Associates, Inc.,
Chicago, Ill.


A. MILITARY APPLICATION


Each of the armed services has tests which attempt to 
measure intelligence, aptitude, knowledge, and ability to 
think. Tests used by the different armed services are 
similar and each service uses similar procedures. However, 
since each service considers itself unique and therefore 
has unique requirements, each has its own tests. 

These tests are of the pencil and paper type and are 
considered to be an essential part of the selection and 
classification procedures, as previously noted. Tests are
usually given early in an individual's military career. 
Results are tabulated mechanically and posted in the member's 
Service Record. Re-tests are allowed if it can be shown by 
the testee that he was under some severe handicap at the time 
of the original test. 

Tabulated test results are used to determine how and in 
what manner the newly inducted service members can make their 
greatest contribution to the mission of their particular 
service. 

Uses of Navy Enlisted Tests 

Test results are first used in the case of Naval
enlisted personnel at Recruit Training Commands. During
their period of recruit training, but after the basic test
battery has been scored and entered in their personnel
record, they are interviewed individually by military
personnel who are alleged to be qualified in personnel
placement work. At this interview, careful consideration is
given to the individual's personal preference, test scores,
civilian work experience, motivation, previous training, and
general interests. The recruit is then given his first
classification.*

* Classification interviews are occasionally carried out
in a perfunctory manner, allowing less than ten minutes per
interview.

At this time recommendations for Class A training are
made, although assignment to schools for all recruits
classified as eligible is frequently impossible due to peak
recruit inputs, service needs, and school capacities.

Also in the past, recruits who scored low on their
basic test battery were recommended for administrative
discharge at this point inasmuch as they were deemed not
capable of being trained to fill Navy billets. However, this
has been partially corrected by more adequate testing prior
to enlistment which is designed to weed out individuals
whose ability to read and write is suspect.

The Navy's Bureau of Naval Personnel has established
minimum cutting scores on the basic test battery for many
occupations requiring formal training. These minimum scores
are widely disseminated throughout the Navy and individuals
who have attained the required status as indicated by their
scores are eligible to apply for formal training after they
have left recruit training. Individuals who apply for
Class A formal training after recruit training are subject to
handicaps, but channels are available to overcome these
impediments if the qualified person is persistent.

One of the most severe handicaps is the constant shortage 
of on board personnel in commands afloat and ashore. This 
shortage has occasionally been justified, but more frequently 
it is the result of local provincialism. Whatever the reason, 
individual requests from qualified personnel frequently never 
leave their respective commands. 

Another handicap frequently encountered in the field is
a basic misunderstanding as to what test scores mean. For
example, a department needs another striker, and a sailor is
selected based upon rather general criteria. Among the
most heavily weighted factors considered are his basic test
scores. If the newly acquired striker learns his new job
rapidly, he is considered to be a fine fellow and his test
scores are considered an excellent predictor of his success.
If the striker learns the job slowly or not at all, he is
considered a smart ne'er-do-well and is promptly labeled as
such. This label, which is informally spread throughout the
command, adheres to this person without discrimination. He is
seldom afforded another opportunity to strike for an occupation
where his talents could be utilized. In the event this second
hypothetical sailor has sufficient obligated service when a
must-quota for school is sent to the command, he will
probably be made available, but in the meantime he may have
become firmly convinced that the Navy is no place for him.

The Bureau of Naval Personnel periodically issues 
Notices to field activities stating personnel requirements 
and requesting applicants for various schools. Each Notice 
or one of its references contains minimum test score 
qualifications for applicants. Continuing requirements are 
issued in the form of instructions. A limited number of 
instructions and contents are outlined below: 

This instruction concerns the Navy Enlisted Scientific
Education Program (NESEP). NESEP is an uninterrupted
four-year college educational program leading to a
baccalaureate degree in major fields approved by the Chief,
Bureau of Naval Personnel. Upon graduation enlisted personnel
are ordered to Newport, Rhode Island, or elsewhere for Officer
Candidate School (OCS). Upon successful completion of OCS,
students are commissioned in the Regular Navy. Eligibility
requirements include a combined GCT and ARI score of 118.
This assures the Navy that the chances of an individual
succeeding in this program are excellent. Other screening
examinations are given and all necessary precautions are
taken to ensure that applicants are properly motivated.
Similar prerequisites are set forth in the Bureau of Naval
Personnel Notice 1531 of 28 June 1963 for military academy
applicants.

Bureau of Naval Personnel Instruction 1500.150

This instruction concerns the selection and training of
candidates for diving duty. The mental requirements for
selection to this program are listed as desirable and consist
of a combined ARI and MECH score of 105.

Similar prerequisites are established for Electronic,
Clerical, and other occupations. Prerequisites are geared to
the level of skill deemed necessary to function successfully
in the particular occupation. However, Commanding Officers
may request test score waivers in meritorious cases where it
is believed that a candidate does possess the necessary
capacity for training and that this capacity is not reflected
in his test scores.
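
The way a combined cutting score and a waiver interact can be
sketched in a few lines of Python. This is a minimal illustration
and not a reproduction of any Navy procedure: only the two combined
minimums quoted above (118 for GCT plus ARI, 105 for ARI plus MECH)
come from the text, while the program labels, the function name, and
the treatment of a waiver as a simple override are assumptions made
for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prerequisite:
    tests: tuple          # names of the basic battery tests that are summed
    minimum_combined: int  # minimum combined score quoted in the text


# The dictionary keys are hypothetical labels chosen for this sketch.
PREREQUISITES = {
    "NESEP": Prerequisite(("GCT", "ARI"), 118),
    # The text lists the diving requirement as "desirable", not mandatory.
    "DIVING": Prerequisite(("ARI", "MECH"), 105),
}


def meets_prerequisite(scores, program, waiver_granted=False):
    """Return True if the combined score reaches the minimum.

    A granted waiver is modeled here, as an assumption, as overriding
    the score comparison entirely.
    """
    prereq = PREREQUISITES[program]
    combined = sum(scores[test] for test in prereq.tests)
    return waiver_granted or combined >= prereq.minimum_combined


if __name__ == "__main__":
    sailor = {"GCT": 60, "ARI": 55, "MECH": 48}
    print(meets_prerequisite(sailor, "NESEP"))
    print(meets_prerequisite(sailor, "DIVING", waiver_granted=True))
```

A check of this kind treats a score as a threshold and nothing more.
A candidate one point below the minimum is handled no differently
from one twenty points below unless a waiver is requested.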

Efforts by the Chief, Bureau of Naval Personnel to
establish minimum prerequisites necessary for an individual
to attain proficiency in many areas have been extremely
successful, as can be judged by correlation studies between
test scores and grades of students completing courses of
study. However, it should be emphasized that these studies
included only those students who graduated and did not
include those who were disenrolled for various reasons or
given certificates of completion. The prerequisites were
established to reduce costs and to increase the level of
training in the Navy. It can be safely assumed that these
two objectives have been partially met.
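
The effect of computing a correlation on graduates alone can be
illustrated with synthetic data. The Python sketch below is not
based on any Navy study. It assumes normally distributed test
scores and course grades with a chosen underlying correlation, and
it assumes that everyone below an arbitrary pass mark is
disenrolled. All names and parameter values are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=2000, true_r=0.6, pass_mark=-0.5, seed=1964):
    """Compare the coefficient for all entrants with that for graduates."""
    rng = random.Random(seed)
    scores, grades = [], []
    for _ in range(n):
        score = rng.gauss(0.0, 1.0)
        # Grade shares the chosen correlation with the test score.
        noise = rng.gauss(0.0, 1.0)
        grade = true_r * score + math.sqrt(1.0 - true_r ** 2) * noise
        scores.append(score)
        grades.append(grade)
    r_all = pearson_r(scores, grades)
    # Keep only students at or above the pass mark (the "graduates").
    kept = [(s, g) for s, g in zip(scores, grades) if g >= pass_mark]
    r_graduates = pearson_r([s for s, _ in kept], [g for _, g in kept])
    return r_all, r_graduates, len(kept)


if __name__ == "__main__":
    r_all, r_graduates, n_graduates = simulate()
    print(round(r_all, 2), round(r_graduates, 2), n_graduates)
```

In this simulation the coefficient computed on graduates alone is
lower than the coefficient for all entrants. The narrower point is
that a coefficient describes only the group it was computed on, so
a study limited to graduates says little about the students who
were disenrolled. The size and direction of the difference in real
data would depend on why students left.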

Efforts have also been made by the Chief, Bureau of
Naval Personnel to enforce compliance with established
prerequisites. Bureau of Naval Personnel Instruction 1510.7 
of 20 Oct. 1952, which is still in effect, noted that 
excessive numbers of ineligible candidates were being 
received at enlisted service schools. This instruction 
directed the attention of all commanding officers to the 
problem and further directed strict compliance with current 
directives. Failure to meet minimum basic battery test 
scores was listed as one of the most frequent errors causing 
candidates to be ineligible. 

Uses of Navy Officer Tests 

The tests designed for use in selecting and classifying 
officers were outlined in Chapter I. These devices were 
used during World War II. The Officer's Selection Battery 
served to screen out personnel who were not deemed to be of 
officer potential and was given to practically all officers. 
The Officer's Classification Battery was not administered 
to all officers and many officers have never taken this 
series of tests. A survey of ninety-four officers having
from five to eighteen years service at the U. S. Naval
Postgraduate School in 1962 revealed only twenty-one officers
who had taken these tests.* An estimate of the number of
officers who have taken the classification battery is not
available, but these tests have been regularly administered
to all newly commissioned officers since about 1951.

A search of publications, records, and regulations at
the U. S. Naval Postgraduate School does not indicate that
the results of the Officer's Classification Battery are
enjoying wide use.

The Officer's Selection Battery is enjoying wide use.
All applicants for commissioned status are given this
battery. It is given at officer procurement centers and is
given annually to in-service applicants. This test battery is
an extremely basic instrument and furnishes little information
in addition to that required for acceptance or rejection.

* J. Martz and T. E. Rushin, "Determination of Valid
Criteria for Selecting Postgraduate Management School Candidates
on the Basis of Established Academic Performance and Various
Aptitude Tests" (Unpublished research paper, U. S. Naval
Postgraduate School, Monterey, 1962), p. 11.

CHAPTER III
LIMITATIONS OF TESTING 

The limitations of testing in determining mental level,
general aptitudes, and various personality characteristics
of individuals are in the author's estimation presently
uncountable. Not all of these limitations can be overcome in
the foreseeable future. However, they may be reduced to a
respectable level if due recognition is given to the facts.

A limited search of the literature in the field of
testing has not revealed a common concise definition of
intelligence or intellect, although many testers and
psychologists claim that this is what they are measuring.
We can then only conclude that intelligence is what intelligence
tests measure. If this definition of intelligence is
accepted, no satisfactory test of ability to learn will ever
be developed. Tests currently measure intelligence by the
sample technique, i.e. a performance sample is taken under
standardized conditions. This sample has actually been taken
from the achievements of the individual. It has not measured
his ability to learn, which may be far above or far below his
level of achievement as revealed by the sample.

Another limitation of testing is the use of test results.
The example previously given concerning the selection of a
striker is one misuse that could be easily corrected. If
test results are treated as a final measure of ability or
aptitude, then the test is being misused, because tests
cannot furnish us with an absolute numerical measurement of
the individual.


A. TEST CONSTRUCTION 


Test construction is a long and arduous process. A 
decision must first be made concerning the purpose of the 
test, i.e. what abilities, proficiencies, or aptitudes are 
to be measured. In order to do this, the test maker must 
have a knowledge of the requirements of the particular 
functions which the testees will perform. He must then
analyze the component abilities, proficiencies, or aptitudes
which are necessary to perform the stated function. The
test maker then prepares a large number of questions, almost
always of the multiple choice variety for intelligence tests,
to be used in the initial stage. Then a weeding-out process
begins. The test maker may reject many of the questions and
reword others at this stage.

The surviving questions are then "pretested" on people
comparable to those for whom the test is intended, and
a statistical dossier is compiled for each question. If
a question is answered correctly mainly by the "better"
examinees it is a good question. If it is answered
correctly mainly by the "poorer" ones it is a bad
question. If a fair number of the "better" examinees
favor one answer and a comparable number favor another,
the question is probably ambiguous. If everyone gets
it right, it is useless. And so on.

In the light of pretest statistics, still further
questions are rejected or rewritten, and ultimately a
rigorously screened version of the test emerges. It is
now ready to be given to the people for whom it was
constructed. ...The test is given a preliminary try
out and the results receive elaborate statistical
analysis.
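
The "statistical dossier" described in the quotation can be sketched
as a simple item analysis. The Python example below is one possible
reading and not the procedure of any particular test maker. It
assumes that the "better" examinees are the top half ranked by total
score on the same pretest, that a question is "good" when the
proportion correct in the top half exceeds that in the bottom half
by at least 0.3, and that the pretest data shown are invented for
the illustration.

```python
def item_analysis(responses, good_threshold=0.3):
    """Summarize each question from pretest results.

    responses holds one row per examinee and one 0/1 entry per question
    (1 = keyed answer chosen). Returns, for each question, a tuple of
    (proportion correct, upper-minus-lower difference, verdict).
    """
    n_items = len(responses[0])
    # Rank examinees by total score; the top half stands in for the
    # "better" examinees and the bottom half for the "poorer" ones.
    ranked = sorted(responses, key=sum, reverse=True)
    half = len(ranked) // 2
    upper, lower = ranked[:half], ranked[-half:]

    report = []
    for i in range(n_items):
        p_all = sum(row[i] for row in responses) / len(responses)
        p_upper = sum(row[i] for row in upper) / half
        p_lower = sum(row[i] for row in lower) / half
        difference = p_upper - p_lower
        if p_all in (0.0, 1.0):
            verdict = "useless"   # everyone right (or wrong): no spread
        elif difference < 0:
            verdict = "bad"       # answered correctly mainly by the poorer
        elif difference >= good_threshold:
            verdict = "good"      # answered correctly mainly by the better
        else:
            verdict = "weak"
        report.append((p_all, difference, verdict))
    return report


# Invented pretest data: eight examinees, five questions.
PRETEST = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0],
]

if __name__ == "__main__":
    for number, row in enumerate(item_analysis(PRETEST), start=1):
        print(number, row)
```

With these data the third question is flagged as bad and the fourth
as useless. The "better" examinees are defined by the same keyed
answers that are being evaluated, so the procedure rewards agreement
with the key rather than independent evidence of ability.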

At the time when original construction begins the test
maker decides what salient characteristics testees must
possess in order to perform the job for which the test is to
be given. These characteristics may vary in number, but
are usually few. Original questions are selected
to measure each of these characteristics and, it is hoped,
through the procedure quoted above the finished test in its
smooth form will furnish the test user with sufficient
information to make a better personnel
decision than he could have made without the test.

In constructing the test, every possible aspect has
been standardized. Standardized time, room temperature, and
lighting are desirable. Timing is considered particularly
important inasmuch as this helps to weed out testees who are
not familiar with the subject matter of questions and are
slow in coming up with an answer. This also saves time on
the part of the tester and the testee.


Banesh Hoffman, "The Tyranny of Multiple-Choice Tests",
Harper's Magazine, CCXXII No. 1330 (March, 1961), p. 38.

Another aspect of standardization in tests is the answers.
Questions of the multiple choice variety usually request that
the correct answer be chosen out of 3, 4 or 5 possible answers
or that the best answer be chosen by the testee. When the
test is pretested, standardized answers are selected by the
test maker on the basis of the most successful examinees'
answers. Answers thus obtained are, of course, subjected to
the most severe statistical analysis. Assurances can then
be given without reservation that the standardized answer for
each question is significant at a particular level. Since
we have predetermined answers for questions, test grading is
a very simple matter requiring no judgement. Where large
numbers of tests are involved, grading by machine is the
least expensive and most accurate method of determining test
scores. This is true of all "objective" type standardized
tests.

Objective type multiple choice tests are generally
thought to be of very high caliber inasmuch as the margin for
human error has been largely removed from well constructed
tests. This is a possible error in test construction.
Individuals who take tests can only answer questions
subjectively, i.e. within the framework of their own experiences,
achievements and judgement. Mass testing with predetermined
answers accentuates previous experiences and achievements.
Little stress is placed on the judgement of the individual.
Justification for this emphasis on experience and achievement 
is readily apparent. The individual with the desired level
of intelligence or judgement will have had experiences
similar to others in society. Therefore his current judgement
or reasoning ability is a direct result of his past achieve- 
ments and experiences. If he has no experience in an area 
being tested, he will be scored low by the machine, because 
he was not standard and didn't produce standard answers. 

In some cases the testee is penalized for using
judgement. An example of a sentence completion item from the
Navy's General Classification Test will reveal this.

A good sailor will ______ the orders of his superior
officer.
(A) see 
(B) fear 
(C) question 
(D) obey 
(E) change 


It is presumed that this question is no longer in use,
if it was ever used, but it is felt that it is representative
of many completion (choose the best answer) type questions.
This question or a similar question is administered to
recruits after a period of recruit training. During training,
conformity in thought and actions is a desirable behavior
pattern. Lectures laud the life of a good sailor and the
merits of obedience are extolled. Movies of the Navy's
great accomplishments are shown and to each order issued by
a superior officer, good sailors have responded with an
"Aye, Aye, Sir" meaning that the good sailor has received
the order, he comprehends the order, and he will obey the
order. The key word then becomes "obey", but is this the
desired objective answer? Most sailors will undoubtedly choose
"obey" to complete this sentence. A few will choose answers
B, C, and E because they don't understand the question, or
they have psychological incapacities. But the sailor who is
attempting to use judgement is at an impasse. He knows that
a good sailor must receive an order, comprehend it, and then
obey it. This sailor knows that the statement implies orders
have been issued. Further he knows that to see or comprehend
an order is an absolute prerequisite to obeying. Should a
good sailor comprehend all orders given by superiors or
should a good sailor obey orders received without question,
whether he understands them or not? He may follow a logical
sequence and give "see" as his answer or he may try to figure
out what answer the test maker wants and give "obey". In
either case he has been left far behind other testees and
may not finish the test in standard time.

Research Report 58-2, loc. cit.

17 The objective answer to this question is unknown.

Other examples of objective type questions that might
be deemed confusing, vague, misleading or ambiguous can be
found in many tests in use today. The California Survey of
Mental Maturity, Form I, is considered by many authorities
to be a test of considerable merit. This is a multiple
choice objective type test divided into language and
non-language sections with several subsections in each section.
One subsection of the language section on page 5, left
column, states: "In each row, there is one picture that shows
something which is the opposite of the first picture. Mark
its number. (Items 23-27)"18 Question number 25 gives a
picture of falling rain in a wooded area as a first picture
and as its possible opposites, there are pictures of (1)
an exploding stick of dynamite; (2) a geyser spewing into
the air; (3) a water fountain sprinkling water into the air;
and (4) a mountain stream. The correct objective answer to
this question is number 4 possibly because the test makers
thought that a mountain stream was not violent. A non-random
sample of seven testees of high intellect chose number 1 as
the correct answer and all because the other three choices
contained moving water.

The language section of this test contains a question,
number 2, considered to be more defective than number 25. For
this subsection, instructions tell us to: "Mark the number
of the word that means the same or about the same as the
first word." The first word listed is oppress. The possible
choices are listed as: promise, imitate, crowd, and burden.
It was not deemed necessary to test this question. In this
case, a review of a current English dictionary by the test
makers would have given cause to remove the question from
the test. Both to crowd and to burden are listed as correct
meanings for oppress.20 However, the objective answer to
this question is burden.

18 W. Clark, et al., Survey of Mental Maturity Form 1
(Los Angeles: California Test Bureau, 1959) p. 5.

For an excellent analysis of multiple choice questions 
with objective answers, readers are invited to consult 
The Tyranny of Testing by Banesh Hoffmann. Dr. Hoffmann has
made a comprehensive study of testing and estimates that as
many as 5 percent of the questions used in our best tests are
defective. He has taken an analytical approach in his
study that may help to improve testing.

B. RELIABILITY OF TESTS 


It is often stated by test makers that a test cannot be
valid unless it is reliable. Reliability, quite simply,
refers to consistency of results. "In theory if an individual
were to take a test three or four times he would answer each
question the same way and would come up with the same score."21
This is ideal reliability and unfortunately seldom happens.
In fact, scores usually improve each time a test is taken
and may improve considerably if the individual has taken other
tests of the same type, even on subject material unrelated
to that of the original test. This is the test-retest method
of measuring reliability and it is not often used for reasons
which are obvious from the above discussion. However, memory
traces which cause improvement each time a test is taken tend
to fade with time and better reliability is found in using the
test-retest method when a time span is allowed between tests.
Since individuals are continually learning, the time span
allows a person to acquire new knowledge which interferes with
our reliability test. For all of these reasons, the
test-retest method is less than satisfactory.

20 C. T. Onions (ed.), The Oxford Universal Dictionary
(New York: Rand McNally & Company, 1955), p. 1377.

21 Rossall J. Johnson, Personnel and Industrial Relations
(Homewood, Ill.: Richard D. Irwin, Inc., 1960), pp. 50-51.

Two other methods of testing for reliability commonly
in use are the equivalent form and split halves methods. In
the equivalent form, two tests are developed of equal difficulty
covering the same subject. If these tests are identical,
scores on the tests are identical. Since any two questions
are never identical, reliability must be estimated, but fair
estimates can be obtained in this manner. The split halves
method is a variation of the equivalent form method. In the
latter method two tests are developed, while in the former, a
single test is split into two parts of equal difficulty and
the halves are measured against each other and then by
statistical massage test reliability can be determined. The
Navy uses both of these methods for developing test
reliability data by using each method singly or combining the
two methods in some cases.

C. VALIDITY OF TESTS

Previously it has been noted that the test maker must 
determine what component abilities, proficiencies and 
aptitudes must be possessed to perform a given function. 
Tests are considered valid if they measure these components 
accurately. However, in addition, before tests are 
considered valid, it must be proven that the original analysis 
is correct, i.e. a test which measures verbal ability may 
have high validity in measuring verbal ability, but the same 
test may have very low or zero validity when correlated
with job success. 

The ideal is seldom found and tests are considered 
beneficial if a positive correlation exists. The greater the 
coefficient of correlation, the better the test. Frequently 
very low correlations are sufficient to weed out personnel 
who are obviously not qualified to perform a given function.

Determining test validity is an extremely difficult 
task. First a job analysis is necessary, then the criteria 
for success must be established. Once the criteria have been
established, a grading or scale for assessing job success for
each individual is necessary. Only then can the validity of
a test be found by matching job success with test scores.

CHAPTER IV
NECESSITY FOR TEST IMPROVEMENT


Testing as a means of mental measurement can be an
excellent tool in our military arsenal. It is not sufficient
to maintain this tool in a static state when conditions and 
needs are dynamic. The present state of testing can be 
likened to a ship which was built in 1944 and has been kept 
in an excellent state of repair. In many respects this ship 
can still fill a vital role in the Navy’s mission just as 
testing assists in classifying, training, and placing personnel. 
Improvements have been made in old weapon systems and new 
ones, through research, have been developed. Old tests have 
been improved; at least statistics tell us that test
reliability and validity are improving with each test revision.


A. CONSTRUCTION 


Test construction has previously been discussed in broad 
outline. Several defects have been pointed out and other 
defects implied. These defects in total, if Dr. Hoffmann’s 
estimate can be accepted, would allow the less intelligent
individual with superficial knowledge to obtain a raw score 
five percent higher than his more intelligent contemporary, 
although the probability of an extreme of this sort is 
quite low and waivers can usually be obtained if an applicant
for any program has persistence. However, the applicant's
persistence must not waver while he is convincing his
Division Petty Officer, Division Officer, Department Head,
and Commanding Officer of his sincerity.

Questions which are constructed using flawless grammar
with the best answer requested from a choice of several
alternatives are at best suspect when more than one correct
choice may be interpreted. There is considerable certainty
that we will obtain answers to questions of this type which
show us a normal distribution, i.e. after the question has
completed the cyclical test for reliability. This distribution
is obtained through careful study of answers given and answers
certainly reflect the experience, education, and achievements,
or lack of same factors, of persons answering the questions.
The author has been unable to locate any relevant studies
which attempt an analysis as to why distracter answers are
chosen by testees taking multiple-choice objective type
tests or for that matter why the objective answer is chosen
by testees. It is felt that this information is an
absolute necessity before questions of this type can be
clearly evaluated and used as a measuring device.

There is, of course, no excuse for constructing questions
which tend to mislead. This is a favorite method of many
college professors who test for rote memorization. This type
of question is not only incorrect, it is a discredit to the
intellect of man. In the category of questions which tend
to mislead, we must include all questions which are vague,
ambiguous, and in general tasteless. Questions of the
misleading variety may serve a valid purpose when used by
experts in individual testing, but their usefulness is
marginal when in group testing we attempt to measure the
intellect of a particular individual, although tests of this
type are beneficial if it is desired to measure one's
ability to detect flaws in construction.

There are very few questions of the types described 
above in use by military testers, but any is too many. These 
questions are a twofold detriment to sound testing because 
the testee must first determine among many variables what 
the question is requesting and then select an answer from 
several possible objective answers. The total possibilities 
in a poorly constructed question can be astronomical in
number. Perhaps probabilities could be assigned to each 
possibility, but this would bring the testee no closer to 
comprehending the question than before and his answer would, 
largely, still be left to chance. 

B. VALIDITY 


It is presumed in this section that military tests are
well constructed. They are highly reliable and reliability
tests show a correlation coefficient of .80 or greater.
The concept of validity is crucial to any testing program.
If a test is perfectly valid, it has a correlation
coefficient of plus 1, i.e. the level of performance for
each worker is identical to his test score in relation to
the group being tested. Perfect validation is illustrated
in Appendix B. At the other extreme, a test may have a
perfectly negative correlation coefficient of minus 1 where
the individuals who obtain the lowest scores are the best
workers. This is also illustrated in Appendix B. In the
event there is no relationship between test scores and work
performance, a zero correlation coefficient, also illustrated
in Appendix B, is said to exist. Tests with a zero correlation
coefficient are considered to have little merit, while those
having a positive or negative correlation can be used.
However, in choosing workers by using a test having a negative
correlation with job success, it must be remembered that
low scores mean that the worker will be a success on the job
for which the test was developed.

The Navy's studies of test validity have been quite 
extensive within a limited range. Available studies indicate 
that the area of coverage has been limited wholly to 
academic performance. This has been necessary because the
Navy has not yet developed an adequate system of rating
officers and men in job performance outside the training
area. Some controversy erupts from time to time between
proponents of various rating methods, but no system
has been officially adopted which even purports to solve
this conundrum. Therefore, since all military personnel
receive some training, the criterion for success hinges on
academic performance in training assignments.

Stuit's22 studies of test validity show a positive
correlation between scholastic achievement and test scores
for most Navy tests used in classifying both officers and
enlisted personnel in World War II. He succinctly points 
out instances of negative correlation, but these are small 
in number and can be disregarded. For the most part 
correlation coefficients fell in the range 210 to 3 /Ci ay 
coefficient above .60 is considered very high. 

A more recent study of test validity revealed that there
was a significant positive relationship between the Basic
Test Battery (BTB) for enlisted personnel and final grades
attained at class A and class P Navy schools.23 Various
combinations of the Basic Test Battery scores were used in this
study. These same test score combinations had previously been
used in assigning personnel to school.

22 Stuit, op. cit., et passim.

23 Research Report 57-1, NAVPERS 18344A, Revised Edition,
Personnel Measurement Research Branch, Personnel Analysis
Division, Bureau of Naval Personnel, April 1957, et passim.

In general, service schools in the Navy are under the
control of the Bureau of Naval Personnel. Service schools 
are established as satellite commands in a larger complex, 
independent commands, and as school commands where schools 
of several types are established under one commanding officer. 
Training programs are established by the Bureau of Naval 
Personnel in conjunction with a technical bureau having 
primary responsibility in the area concerned. The Navy's 
need for training personnel is determined by the Bureau of 
Naval Personnel again in conjunction with the technical 
bureau concerned. Quotas are established and personnel are 
selected and assigned to the various established schools. 
These assignments are based upon service needs, test scores, 
and individual preference. Training commands, at this 
point, have an approved training program and trainable 
students and these commands are expected to train and 
graduate men who are capable of performing technical service 
in today’s Navy of ever increasing complexity. 

There are indications that service schools labor under 
some handicaps in fulfilling their missions. Standards must 
be set as a goal for students. At the same time personnel 
requirements must be considered, so standards must not be 
too high to prevent the required number from completing
training. Standards among schools training personnel for
the same technical specialty do not vary, but commanding
officers who apply these standards in too rigorous a manner
may be subject to severe criticism. In situations where a
school is training personnel who are not meeting standard,
grading may have to be revised according to the study quoted
below.

Validity studies made under circumstances where true 
performance is unknown are tenuous. In addition, school 
operating personnel are placed in a rather difficult 
position in meeting requirements of quality and quantity. A
paragraph from the study noted above does little to instill
confidence in the Navy's studies of test validity. This
paragraph is quoted as follows:24

(Usually the validity coefficients presented for
two class "A" schools training men for the same
ratings are of comparable magnitude. However, in
a few cases there are wide disparities. In these
cases, for the schools with the much lower validities,
the grading system might well be reviewed, since
criterion unreliability is one of the factors which
often reduce the obtained validities of aptitude tests.)

We must, of course, agree that criterion reliability 
is an absolute necessity if we are to obtain reasonably 
correct validity coefficients, but the line of action proposed
here would only increase the validity coefficient and may not
correct it at all. In a military complex, a review has many
connotations and to single out a school and suggest that its
grading system be revised because validity studies do not
compare favorably is tantamount to censure. If some
disparity does exist, an examination is certainly indicated,
but in checking for criterion reliability, we should be a
bit more scientific and review the grading systems of all
schools having the same mission.

A study by Thorndike and Hagen of more than ten thousand 
men who had previously taken military test batteries was 
completed and published in 1959. Several limitations were 
recognized by the authors of this study in reaching their 
conclusions on the validity of aptitude tests as a predictor 
of job success in civilian occupations. All men studied were 
gainfully employed in various jobs of their own choice. It 
is stated:25

"In general conclusion, we must say that though it is
possible that tests of aptitude can show validity in
long-range predictions of occupational success when
individuals are employed in jobs in widely different
parts of the country, our data give little evidence to
encourage this belief."

25 R. L. Thorndike and E. Hagen, Ten Thousand Careers
(New York: John Wiley and Sons, Inc., 1959), p. L&.

Conclusions and results are succinctly stated as follows:26

Our results showed that occupational groups
differed with respect to personal background variables 
as well as with respect to aptitude test scores. It 
is hard to make a quantitative comparison between 
these two types of information, but our judgement would 
be that items of personal background differentiated 
about as sharply as did scores on aptitude tests. 
Once again, the patterns were, in most instances, 
sensible and in accord with what we would have expected 
by a priori analysis of the occupations. It is possible 
to rationalize most of the significant differences with 
some satisfaction. There were, of course, some 
differences that are difficult to rationalize, but 
these can, in many instances, be thought of as chance 
variations and ones that probably would not hold up
in another sample. 

With respect to prediction of success within an 
occupation, our conclusions must be quite different. 
As far as we were able to determine from our data, there 
is no convincing evidence that aptitude tests or 
biographical information of the type that was available 
to us can predict degree of success within an 
occupation insofar as this is represented in the
criterion measures that we were able to obtain. This 
would suggest that we should view the long-range 
prediction of occupational success by aptitude tests 
with a good deal of skepticism and take a very restrained 
view as to how much can be accomplished in this direction. 
It is possible that data for a more heterogeneous group 
of applicants would lead to different conclusions in
this respect; however, our suspicion is that if the group 
had been more heterogeneous, our increased success would 
have shown up primarily in an increased sharpness of 
differentiation among occupations rather than in improved 
ability to predict within a single occupation. Certainly, 
if we had taken the whole range of abilities in the 
American population, the profile patterns would have 
become very much more clear-cut and the differences 
among occupations would have become a good deal more 
striking. Whether at the same time we would have developed 
some success at predicting degrees of achievement within 
an occupation seems very much open to question.

The group involved in this study was limited to
former Army Air Force Cadets. Tests used were of the general
type previously described as being administered to officer
personnel for the purpose of classification. It is felt
that the results, as outlined by Thorndike and Hagen, speak
for themselves and the subject requires no further comment.

CHAPTER V
TOOLS FOR THE FUTURE


The United States Navy and other military services 
have throughout the history of the United States served 
their country well in both war and peace. However, never 
before have the military services been called upon to 
prepare for instantaneous defense of their nation on a
global scale. This calling has necessitated a peacetime
build-up of men and materials beyond the comprehension of
our civilian and military leaders in World War II.

In an effort to minimize the cost of the defense effort,
thus lessening the military drain on the National Economy,
civilian and military leaders have concentrated their efforts
on the spectacular, i.e. areas of high dollar cost. Efforts
in these areas have certainly given us more bang for the
buck. The art of Operational Analysis has been introduced
and promises to be extremely useful in lowering costs and
increasing efficiency. All new, as well as old, projects
are scrutinized to determine if they permit optimum use;
reduce costs; have sufficiently low costs; increase the
speed of; are capable of; promote and conserve; are
compatible with; maximize output; and a myriad of other
catch phrases meaning the same thing--get the most for the
military dollar.

Billions of dollars have been allocated and spent for
research and development of weapons and systems which
are deemed necessary for national defense. Many more billions
have been spent maintaining and operating these weapons
and systems.

In FY 1962, the Navy spent approximately 2.7 billion
dollars on military manpower. A very small fraction of this
amount was allocated to personnel utilization research.27
We have definitely increased our repertory of tools necessary
for the future, but, to a large extent, the tools necessary 
for the proper utilization of manpower have yet to be 
fabricated. 

A. RESEARCH REQUIRED

Much research has already been accomplished, but our 
knowledge of man is extremely limited. The general educational 
level of a person can be obtained by a simple pencil and paper 
test, but our knowledge of individual capacities must be 
increased and put to use. For example, aptitude is defined
as:28 A condition or set of characteristics regarded as
symptomatic of an individual's ability to acquire with
training some (usually specified) knowledge, skill, or set
of responses, such as ability to speak a language, to produce
music.

27 Dollar costs for this program were not available in
the Office of the Navy Comptroller or in the Bureau of Naval
Personnel. It is presumed that information of this type
would be extremely difficult to obtain with the accounting
system currently in use.

28 E. L. Hartley and R. E. Hartley, Outside Readings in
Psychology, (New York: Thomas Y. Crowell Company, 1957),
Doee ve.

This is a broad definition and probably fairly accurate
because it is a wide-angle approach to aptitude. It is to
be noted that knowledge in a specified area is not a necessary
prerequisite to being trained in that area. According to
many learning theorists, learning is accomplished most
rapidly when there is no interference from already acquired
knowledge.

The Navy's test for mechanical aptitude serves to
illustrate that there may be little relationship between
previously acquired mechanical experience and any aptitude for
learning mechanical skills. Validity studies for this test
normally reveal low correlation coefficients because we do
not know what characteristics or abilities are required to
learn a mechanical skill. According to the study by Thorndike
and Hagen, previously quoted, backgrounds differentiated
between occupations as sharply as did aptitude test scores.
This then appears to be an area that requires considerable
basic and applied research.

It is not felt that research of the type alluded to in
the previous paragraph should be performed within the military
establishment inasmuch as personnel who are in ratings,
specialities or occupations at present are probably not
representative of a population which seeks its own level in
society. Most military enlisted billets are filled by
personnel who were considered trainable in a particular
speciality at an early stage of their military service by
virtue of their test scores. Studies of this group have
vindicated past procedures and will certainly do so in the
future, but will furnish little usable data. Many military
specialities are, of course, not found in use in the
civilian economy nor will a military environment be frequently
found, but these superficial handicaps will for practical
purposes disappear when they are carefully examined.

B. SELECTION FOR TRAINING


Chapter II briefly outlined the manner in which tests 
are currently used for selecting enlisted personnel for
training. At that point it was noted that the Officer
Classification Battery (OCB) was not enjoying wide use as a
selection device. It was further shown in Chapter IV that
the (OCB), in a study by Thorndike and Hagen, may have little
validity as a predictor of what occupation will be chosen
by the individual and less validity as a predictor of
success.

The Superintendent, United States Naval Postgraduate
School,29 by inference, agrees that the (OCB) is an instrument
of limited usefulness. In a letter, Ser: 2166 dated 2 Aug 1963 
to the Chief of Naval Personnel, the Superintendent set forth 
his recommended guidelines for the Postgraduate Selection 
Board's use in selecting students for postgraduate study 
during academic year 1964-1965. These recommendations were 
straight-forward and pertinent, but there was no mention of 
the Officer Classification Battery. 

Due to the diverse backgrounds of the several thousand 
officers considered for postgraduate study, some common 
attribute that could be used as a predictor of academic success 
was needed. This was essentially resolved by considering the
officer's background as reflected in his personnel record on
file in the Bureau of Naval Personnel. Each officer had on
file fitness reports from which the Selection Board could
determine the level of his past performance, for the most
part, in non-academic assignments. The Selection Board also
had available academic transcripts of undergraduate education
from several hundred colleges and universities. The criterion
for assigning grades in many of these schools was unknown.
The direct cost of selection by this method is not insignificant
and the opportunity costs can be appalling.

29 This is the largest institution of its type in the world
and its primary mission is the postgraduate education of Naval
Officers.

Economics of Testing

Tests used by the military services as entrance 
screening devices in peace-time save the tax-payers from an
unnecessary burden in two ways. First, monies are not
wasted in attempting to train personnel who do not have the
requisite capacities for military service and second the
total efficiency of the military organization is increased by
eliminating the possibility of non-trainable personnel acting
as a drag in an otherwise smooth-running organization.
Entrance standards have been low in the past and perhaps
will be lower in the future if the military services are
required to enlist and train the masses of unemployable.
However, the military services with the exception of the Army
have been able to screen out most of the untrainables prior 
to enlistment. 

This paper is principally concerned with what happens 
after enlistment or commissioning since costs prior to this 
time are insignificant as far as tests are concerned. Pay 
and allowances with variations for promotions, transfers, etc. 
are relatively fixed and can be roughly considered as sunk 
costs for the duration of an enlistment or tour of active 
duty. 

The extent of testing officer and enlisted personnel
on active duty essentially depends upon the time available
for testing and the costs involved in testing. However, the
economic benefits to be derived hinge directly upon the 
validity of the tests. If test validity is zero or positive 
with a low degree of confidence, the use of tests is not 
economically feasible because of the expense involved in 
testing. 

The current procedure for selecting Naval Officers for 
postgraduate education is an example of the inadequacies and 
diseconomies of tests as mental measurement devices. It is 
evidently felt by military authorities that the use of the 
(OCB) as a decision making tool would give rise to more 
wrong decisions than correct decisions. If a test does this, 
it is an economic burden. Several studies have been done by 
Naval Management students on the validity of the mathematical 
and verbal portions of the (OCB) as a predictor of success 
in the Management curriculum. Appendix C illustrates in 
plotted form the results of one such study. It can easily 
be seen that the validity is near zero. These tests may be 
valid as a predictor of success in other areas, but they cannot 
be justified economically as a tool for selecting 
management students. 
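
The validity discussed above is conventionally expressed as a correlation coefficient between test scores and a criterion such as course grades. The following is a minimal sketch of that computation, assuming Pearson's product-moment coefficient. The function name and the scores and grades shown are hypothetical illustrations and are not the Appendix C data.

```python
import math

def validity_coefficient(test_scores, criterion):
    """Pearson correlation between test scores and a criterion measure."""
    n = len(test_scores)
    if n != len(criterion) or n < 2:
        raise ValueError("need two equal-length samples of at least two points")
    mean_x = sum(test_scores) / n
    mean_y = sum(criterion) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(test_scores, criterion))
    sxx = sum((x - mean_x) ** 2 for x in test_scores)
    syy = sum((y - mean_y) ** 2 for y in criterion)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sample has no defined correlation")
    return sxy / math.sqrt(sxx * syy)

# ASSUMPTION: hypothetical test scores and grades, for illustration only.
scores = [50, 55, 60, 65, 70]
grades = [3.0, 2.5, 3.5, 2.5, 3.0]
r = validity_coefficient(scores, grades)
```

A coefficient near zero, as in this invented sample, means that the test scores tell the selector essentially nothing about the criterion, which is the situation the text describes for the management curriculum.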

Ideally, if we have one thousand new inductees and one 
thousand billets to fill, tests of intelligence, aptitude and 
abilities, with perfect validity, would allow these officers 
and men to be placed in assignments consistent with their 
qualifications. This in turn would raise efficiency, i.e. 
output per man, and billets could be deleted in direct 
proportion to increased efficiency inasmuch as only a given 
level of output is required or can be economically tolerated 
for defense. A testing program of this magnitude is 
difficult to comprehend and perhaps not realistic when costs 
are considered, but if, through testing and the proper placement 
of personnel, we could achieve a one percent increase in 
military manpower efficiency in FY 1964, a reduction in total 
manpower requirements would save more than $120,000,000 while 
maintaining the same output. 
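
The arithmetic behind a saving of this kind can be sketched as follows. If output per man rises by a given fraction, the same output can be held with proportionately fewer billets. The annual manpower cost used here is a hypothetical round figure, chosen only so that the result is of the same order as the saving quoted above. It is not taken from the text, and the function name is likewise an assumption.

```python
def manpower_saving(annual_cost, efficiency_gain):
    """Cost saved when the same output is held by a smaller, more efficient force.

    If output per man rises by `efficiency_gain` (0.01 for one percent),
    only 1 / (1 + efficiency_gain) of the original force is needed.
    """
    if efficiency_gain <= -1.0:
        raise ValueError("efficiency_gain must be greater than -1")
    fraction_needed = 1.0 / (1.0 + efficiency_gain)
    return annual_cost * (1.0 - fraction_needed)

# ASSUMPTION: a hypothetical annual manpower cost of 12.5 billion dollars.
saving = manpower_saving(12_500_000_000, 0.01)
```

Under that assumed cost, a one percent gain in efficiency yields a saving somewhat above $120,000,000, slightly less than one percent of the total because the reduced force is itself one percent more productive.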

The military services are constantly striving to increase 
the effectiveness of their weapons at the lowest possible 
cost. Historically, manpower has been the most effective 
weapon possessed by any nation involved in conflict. Manpower 
must be considered as our most effective weapon in any future 
conflicts, but wars can not be won in the modern age if we 
use our resources in a haphazard manner. Testing assists in 
the proper utilization of human resources and can become a 
more valuable tool in the future. 

Any tool, such as testing, can be misused and result in 
diseconomies which are reflected in exorbitant opportunity 
costs. These costs arise in several ways, but the basic 
misuses occur from treating test results as an absolute 
indicator when in reality they should be considered as a 
sample which does not reflect drive, motivation, interests, 
or even aptitudes clearly. Another misuse which clearly 
results in opportunity cost is to ignore test results when 
they should be used. 

Testing as a tool for the future presupposes that the 
military services will train personnel of the highest 
calibre in Personnel Management in order that decisions 
involving personnel classification, placement, training, and 
assignment will be made which reflect service needs, 
personal needs on the part of members, and economies in 
management. 
CHAPTER VI 
CONCLUSIONS AND RECOMMENDATIONS 
A. CONCLUSIONS 


The military services are using paper and pencil tests 
to measure intelligence, aptitudes, and achievements. These 
tests are contributing to the efficient utilization of military 
manpower. Each of the military services, in its testing 
program, presupposes unique personnel requirements. This is 
difficult to fathom except for isolated occupations. 

Testing, for the purpose of mental measurement, has not 
reached its maturity and much basic research is required. 
In fact, testing for military use is, at best, in its 
infancy. Efforts to expand the frontiers of knowledge have 
been tenuous and narrow in military testing. Improvements 
have been made in the (BIB) for Naval enlisted personnel, but 
validity studies indicate a need for better instruments. 

Testing in its present state is a sampling device 
which cannot be used effectively without considering background 
factors, drive, and motivation in the assignment of 
personnel. The consideration of background factors, drive, 
and motivation has not been significant in the placement of 
Naval enlisted personnel, while these have been the only 
factors considered when Naval officer personnel are selected 
for advanced or postgraduate training. 
Many defects in our current testing program exist 
because we originally made a cursory examination of the 
occupations for which the tests were created. The expediency 
required by war does not justify this superficial approach 
to job analysis during peacetime. 

Test validity studies justify the cost of testing Naval 
enlisted personnel if the studies themselves can be accepted 
as valid, but little is known about the relationship between 
job performance and test scores, although much worthwhile 
information has been gained from studies of the relationship 
between non-performance and test scores. 

Lastly, it is concluded that testing in the military 
services is a necessary and important part of Military 
Personnel Management. 


B. RECOMMENDATIONS 


The following recommendations are made subject to 
revision as new and/or more reliable data becomes available. 
1. The present military testing program should be 
continued. However, those tests not deemed sufficiently 
valid to be used should be discontinued immediately. 

2. Studies should be initiated by the Department of 
Defense to determine if the four military services do 
have unique personnel requirements. 
3. An office should be established at the Department of 
Defense level to coordinate and evaluate an intensive 
research program. This program should be directed 
toward occupational selection by individuals with an 
advanced goal of success prediction. 

4. A job analysis for each billet should be commenced 
by the military services and coordinated by the 
Department of Defense. 

5. Training programs should be initiated by each military 
service to train all personnel who make personnel 
decisions in the uses and limitations of test scores. 

6. Classification centers of each of the military 
services should be staffed by personnel thoroughly 
trained in eliciting background information from 
individuals being interviewed as well as ascertaining 
their motivations, drives, and ambitions. Centers 
should be staffed with sufficient numbers of such 
personnel to allow a minimum of one hour for each 
interview. Personnel being classified should be given 
a definite or a conditional classification. Personnel 
who have been given a conditional classification should 
be interviewed again, at a classification center, at 
the end of one year and given a definite classification. 


7. Each of the military services must develop a 
performance rating system that will enable reviewing 
authorities to evaluate performance by the degree of 
job success. 

8. The Department of Defense should request in the 
next military budget monies for manpower utilization 
research. 

9. Tests in current use should be validated as soon as 
job analyses are complete and job performance 
evaluations are available. 

10. The Department of Defense should plan and coordinate 
the entire program as previously outlined in brief, and 
a standardized military testing program should be 
developed at this level as soon as economically feasible. 
APPENDIX A 
The range of Navy Standard Scores theoretically extends 
from zero to one hundred, but the probability of a score 
being below 20 or above 80 is .13% and scores in the extreme 
ranges are not included 
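
The tail probability quoted above can be checked with a short calculation. The sketch below assumes that Navy Standard Scores follow a normal distribution with a mean of 50 and a standard deviation of 10. That assumption is not stated in this appendix, and the function name is hypothetical.

```python
import math

def normal_cdf(x, mean=50.0, sd=10.0):
    """Cumulative probability of a normal distribution at x.

    ASSUMPTION: the defaults (mean 50, standard deviation 10) are the
    presumed parameters of the standard score scale.
    """
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

below_20 = normal_cdf(20.0)        # three standard deviations below the mean
above_80 = 1.0 - normal_cdf(80.0)  # three standard deviations above the mean
both_tails = below_20 + above_80
```

Under that assumption each tail taken separately carries a probability of roughly 0.13 percent, which agrees with the figure quoted, and the two tails together carry about 0.27 percent.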
APPENDIX C 

1962 Navy Management School Class 

[Scatter plot. Horizontal axis: OCB MATH + VERBAL test scores, ranging to 70. Vertical axis label and plotted points not recoverable.] 